Practical Adversarial Combinatorial Bandit Algorithm via Compression of Decision Sets
Authors
Abstract
We consider the adversarial combinatorial multi-armed bandit (CMAB) problem, whose decision set can be exponentially large with respect to the number of given arms. To avoid dealing with such large decision sets directly, we propose an algorithm performed on a zero-suppressed binary decision diagram (ZDD), which is a compressed representation of the decision set. The proposed algorithm achieves either O(T^{2/3}) regret with high probability or O(√T) expected regret as the any-time guarantee, where T is the number of past rounds. Typically, our algorithm works efficiently for CMAB problems defined on networks. Experimental results show that our algorithm is applicable to various large adversarial CMAB instances, including adaptive routing problems on real-world networks.
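To illustrate why a compressed decision set helps, here is a minimal sketch of weighted sampling over a ZDD without enumerating its sets. It is not the paper's algorithm: the hand-built three-arm diagram, the node numbering, and the function names `partition_sums` and `sample_set` are all hypothetical, and the per-arm weights stand in for whatever weighting a bandit algorithm would maintain (for example, exponential weights over cumulative losses).

```python
import random

BOT, TOP = 0, 1  # terminal node ids

# Hypothetical hand-built ZDD: node_id -> (arm, lo_child, hi_child).
# Taking the hi edge includes the arm; the lo edge excludes it.
# This diagram encodes the decision set {{0, 1}, {2}}.
ZDD = {
    2: (2, BOT, TOP),
    3: (1, BOT, TOP),
    4: (0, 2, 3),
}
ROOT = 4


def partition_sums(zdd, weights):
    """For each node, sum the products of arm weights over all sets below it."""
    z = {BOT: 0.0, TOP: 1.0}
    # Assumption of this sketch: children always have smaller ids than parents,
    # so ascending id order visits children first.
    for node in sorted(zdd):
        arm, lo, hi = zdd[node]
        z[node] = z[lo] + weights[arm] * z[hi]
    return z


def sample_set(zdd, root, weights, rng):
    """Draw one set with probability proportional to the product of its arm weights."""
    z = partition_sums(zdd, weights)
    chosen, node = [], root
    while node not in (BOT, TOP):
        arm, lo, hi = zdd[node]
        p_hi = weights[arm] * z[hi] / z[node]
        if rng.random() < p_hi:
            chosen.append(arm)
            node = hi
        else:
            node = lo
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_set(ZDD, ROOT, [2.0, 3.0, 5.0], rng))
```

Both functions run in time proportional to the number of ZDD nodes rather than the number of sets in the decision set, which is the property the compressed representation is meant to provide.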
Similar resources
Stochastic and Adversarial Combinatorial Bandits
This paper investigates stochastic and adversarial combinatorial multi-armed bandit problems. In the stochastic setting, we first derive problem-specific regret lower bounds, and analyze how these bounds scale with the dimension of the decision space. We then propose COMBUCB, algorithms that efficiently exploit the combinatorial structure of the problem, and derive finite-time upper bound on thei...
Combinatorial Bandits Revisited
This paper investigates stochastic and adversarial combinatorial multi-armed bandit problems. In the stochastic setting under semi-bandit feedback, we derive a problem-specific regret lower bound, and discuss its scaling with the dimension of the decision space. We propose ESCB, an algorithm that efficiently exploits the structure of the problem and provide a finite-time analysis of its regret....
Online combinatorial optimization with stochastic decision sets and adversarial losses
Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can be blocked or goods that are out of stock. In this paper we study learning algorithms that are able to deal with stochastic availability of such unreliable c...
Efficient Algorithms for Adversarial Contextual Learning
We provide the first oracle efficient sublinear regret algorithms for adversarial versions of the contextual bandit problem. In this problem, the learner repeatedly makes an action on the basis of a context and receives reward for the chosen action, with the goal of achieving reward competitive with a large class of policies. We analyze two settings: i) in the transductive setting the learner k...
More Adaptive Algorithms for Adversarial Bandits
We develop a novel and generic algorithm for the adversarial multi-armed bandit problem (or more generally the combinatorial semi-bandit problem). When instantiated differently, our algorithm achieves various new data-dependent regret bounds improving previous work. Examples include: 1) a regret bound depending on the variance of only the best arm; 2) a regret bound depending on the first-order...
Journal title:
- CoRR
Volume abs/1707.08300 Issue
Pages -
Publication date 2017